KAFKA-17747: Add compute topic and group hash #47
Conversation
Signed-off-by: PoAn Yang <[email protected]>
Summary of Changes
Hello @visz11, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces new hashing utilities for topics and groups within the group coordinator.
Walkthrough
Added Guava as a dependency and an import allowance. Introduced deterministic hashing utilities in Group for topics and groups using Murmur3, leveraging cluster and topic metadata. Added comprehensive unit tests validating hashing stability and sensitivity. Updated Gradle dependency catalogs and versions, and checkstyle import controls.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches: ❌ Failed checks (2 warnings) · ✅ Passed checks (1 passed)
KAFKA-17747: Add compute topic and group hash
TL;DR: Adds hash computation utilities for topics and groups using Guava's Murmur3 hashing, with comprehensive test coverage.
PR Summary
Implements deterministic hash computation methods for Kafka topics and consumer groups using Google Guava's Murmur3 algorithm.
Change Highlights
Sequence Diagram
sequenceDiagram
participant C as Coordinator
participant G as Group
participant T as TopicImage
participant CI as ClusterImage
participant H as Hasher
C->>G: computeTopicHash(topicImage, clusterImage)
G->>H: newHasher()
G->>H: putByte(0) magic
G->>T: getId(), getName(), partitions()
T-->>G: topic metadata
G->>H: putLong(id), putString(name), putInt(partitions)
G->>CI: getBrokerRacks()
CI-->>G: rack information
G->>H: putInt(partitionId), putString(racks)
H-->>G: hash.asLong()
C->>G: computeGroupHash(topicHashes)
G->>H: combineOrdered(sortedHashes)
H-->>C: combined hash
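The diagram's final combination step never appears as code in this thread, so here is a minimal sketch of it, assuming computeGroupHash receives a map of topic name to topic hash (the exact signature is not shown on this page):

import com.google.common.hash.HashCode;
import com.google.common.hash.Hashing;

import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class GroupHashSketch {
    // Combines per-topic hashes into one group hash. TreeMap iterates in
    // topic-name order, so the combined result is deterministic.
    static long computeGroupHash(Map<String, Long> topicHashes) {
        List<HashCode> ordered = new TreeMap<>(topicHashes).values().stream()
            .map(HashCode::fromLong)
            .collect(Collectors.toList());
        // Hashing.combineOrdered requires at least one hash code.
        return Hashing.combineOrdered(ordered).asLong();
    }
}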
Testing Guide
/refacto-visz
Code Review
This pull request introduces hashing functions for topics and groups using the Guava library, which is a valuable addition for tracking metadata changes. The implementation is sound, and the accompanying tests are thorough. I have a few suggestions to enhance the robustness and maintainability of the new code. My recommendations include using a more appropriate hashing method for UUIDs, simplifying a stream operation, and refactoring some duplicated code in the tests.
HashFunction hf = Hashing.murmur3_128();
Hasher topicHasher = hf.newHasher()
    .putByte((byte) 0) // magic byte
    .putLong(topicImage.id().hashCode()) // topic Id
Using topicImage.id().hashCode() for hashing the Uuid is not ideal, as it truncates the 128-bit UUID to a 32-bit integer hash, increasing the potential for collisions. Feeding both halves of the UUID into the hasher via putLong(getMostSignificantBits()) and putLong(getLeastSignificantBits()) uses the full 128 bits and makes the hash more robust.
Please note that this change will require updating the corresponding logic in GroupTest.java.

-     .putLong(topicImage.id().hashCode()) // topic Id
+     .putLong(topicImage.id().getMostSignificantBits()) // topic Id (MSB)
+     .putLong(topicImage.id().getLeastSignificantBits()) // topic Id (LSB)
    .filter(Optional::isPresent)
    .map(Optional::get)
The pattern .filter(Optional::isPresent).map(Optional::get) can be simplified to .flatMap(Optional::stream) since your project's Java version supports it. This change will make the stream pipeline more concise and idiomatic.
-     .filter(Optional::isPresent)
-     .map(Optional::get)
+     .flatMap(Optional::stream)
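For illustration, a small self-contained sketch of the simplification (the rack values are made up, not the PR's fixtures):

import java.util.List;
import java.util.Optional;

public final class OptionalStreamDemo {
    public static void main(String[] args) {
        List<Optional<String>> racks =
            List.of(Optional.of("rack0"), Optional.empty(), Optional.of("rack1"));

        // Verbose form: drop empty Optionals, then unwrap the rest.
        List<String> viaFilterMap = racks.stream()
            .filter(Optional::isPresent)
            .map(Optional::get)
            .toList();

        // Idiomatic since Java 9: Optional.stream() yields zero or one elements.
        List<String> viaFlatMap = racks.stream()
            .flatMap(Optional::stream)
            .toList();

        System.out.println(viaFilterMap.equals(viaFlatMap)); // prints true
    }
}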
@Test
void testComputeTopicHash() {
    long result = Group.computeTopicHash(FOO_METADATA_IMAGE.topics().getTopic(FOO_TOPIC_ID), FOO_METADATA_IMAGE.cluster());

    HashFunction hf = Hashing.murmur3_128();
    Hasher topicHasher = hf.newHasher()
        .putByte((byte) 0) // magic byte
        .putLong(FOO_TOPIC_ID.hashCode()) // topic Id
        .putString(FOO_TOPIC_NAME, StandardCharsets.UTF_8) // topic name
        .putInt(FOO_NUM_PARTITIONS) // number of partitions
        .putInt(0) // partition 0
        .putString("rack0;rack1", StandardCharsets.UTF_8) // rack of partition 0
        .putInt(1) // partition 1
        .putString("rack1;rack2", StandardCharsets.UTF_8); // rack of partition 1
    assertEquals(topicHasher.hash().asLong(), result);
}
There is significant code duplication across the testComputeTopicHash* methods. The logic for building the hash is repeated in testComputeTopicHash, testComputeTopicHashWithDifferentMagicByte, testComputeTopicHashWithDifferentPartitionOrder, and testComputeTopicHashWithDifferentRackOrder. This makes the tests brittle and harder to maintain, as any change to the hashing logic in Group.computeTopicHash would require updates in multiple test methods.
Consider refactoring this logic into a private helper method within the test class. This would centralize the hash construction, making the tests cleaner and more maintainable.
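A sketch of such a helper, reusing the test's existing constants (the method and parameter names are illustrative, not from the PR):

// Centralizes expected-hash construction; each test then varies only the inputs.
private static long expectedTopicHash(byte magicByte,
                                      int firstPartition, String firstRacks,
                                      int secondPartition, String secondRacks) {
    return Hashing.murmur3_128().newHasher()
        .putByte(magicByte) // magic byte
        .putLong(FOO_TOPIC_ID.hashCode()) // topic Id
        .putString(FOO_TOPIC_NAME, StandardCharsets.UTF_8) // topic name
        .putInt(FOO_NUM_PARTITIONS) // number of partitions
        .putInt(firstPartition)
        .putString(firstRacks, StandardCharsets.UTF_8)
        .putInt(secondPartition)
        .putString(secondRacks, StandardCharsets.UTF_8)
        .hash()
        .asLong();
}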
| <allow pkg="org.apache.kafka.coordinator.common.runtime" /> | ||
| <allow pkg="com.google.re2j" /> | ||
| <allow pkg="org.apache.kafka.metadata" /> | ||
| <allow pkg="com.google.common.hash" /> |
Suggestion: Narrow the permission surface by marking the package as exact-match to avoid accidentally allowing subpackages under com.google.common.hash. [security]
| <allow pkg="com.google.common.hash" /> | |
| <allow pkg="com.google.common.hash" exact-match="true" /> |
Why Change? ⭐
The proposed change only adds the attribute exact-match="true" which is already used elsewhere in this import-control file (e.g. other <allow ... exact-match="true" /> lines), so the attribute is supported by the DTD referenced at the top of the file and will not introduce a syntax error.
This modification narrows the permission surface by allowing only the exact package com.google.common.hash and not its subpackages. It is syntactically valid XML and consistent with the surrounding entries. Assumptions:
- The import-control DTD used by this project supports the exact-match attribute (evidence: other lines in this file use it).
- No runtime code requires implicit access to subpackages under com.google.common.hash; if such access is required, this change would be a deliberate tightening of permissions rather than a syntactic bug.
Given these points, the change is safe from a syntax/execution perspective and does not introduce errors into the configuration itself.
HashFunction hf = Hashing.murmur3_128();
Hasher topicHasher = hf.newHasher()
    .putByte((byte) 0) // magic byte
    .putLong(topicImage.id().hashCode()) // topic Id
Suggestion: Use a deterministic string representation of the topic id (e.g. id().toString()) instead of hashCode() when feeding the hasher to avoid collisions and ensure consistent hashing across JVMs. [possible bug]
-     .putLong(topicImage.id().hashCode()) // topic Id
+     .putString(topicImage.id().toString(), StandardCharsets.UTF_8) // topic Id as string
Why Change? ⭐
The improved code is syntactically correct and uses only symbols present in the file:
- Hasher.putString(CharSequence, Charset) is a valid Guava Hasher API, and StandardCharsets is already imported.
- topicImage.id().toString() is a safe replacement for topicImage.id().hashCode(): it produces a deterministic, textual representation of the id (typically a UUID or equivalent), avoiding reliance on JVM-specific hashCode implementations and reducing collision risk.
- The change does not introduce additional null-safety concerns beyond the original (both .hashCode() and .toString() would NPE if id() is null).
Assumptions and rationale:
- topicImage.id() has a meaningful, stable toString() (true for UUIDs and Kafka Uuid types).
- The change intentionally alters the bytes fed into the hasher to a deterministic textual form; this is a compatible internal change to the hashing input and compiles without additional imports.
Given these verifiable points, and since the edit is local and uses existing imports/APIs, this suggestion is classified as verified.
Code Review: Group Coordinator Hash Implementation
👍 Well Done
    .map(Optional::get)
    .sorted()
    .collect(Collectors.joining(";"));
topicHasher.putString(racks, StandardCharsets.UTF_8); // sorted racks with separator ";"
Missing Import Statement
Code uses Collectors.joining() but is missing an import for java.util.stream.Collectors. Without the import declaration, the stream operation cannot resolve Collectors and the file will fail to compile.
import java.util.stream.Collectors;
Committable Suggestion

- topicHasher.putString(racks, StandardCharsets.UTF_8); // sorted racks with separator ";"
+ import java.util.stream.Collectors;
Standards
- Algorithm-Correctness-Dependency-Resolution
- Logic-Verification-Compilation-Safety
HashFunction hf = Hashing.murmur3_128();
Hasher topicHasher = hf.newHasher()
    .putByte((byte) 0) // magic byte
    .putLong(topicImage.id().hashCode()) // topic Id
Hardcoded Magic Byte
The magic byte value is hardcoded as the constant zero, violating the organization guideline against hardcoding variables. This reduces hash-algorithm flexibility and prevents versioning of the hash computation logic.
Standards
- Org-Guideline-Hardcoding of variables
- CWE-547
- OWASP-A05
topicHasher.putInt(entry.getKey()); // partition id
String racks = Arrays.stream(entry.getValue().replicas)
    .mapToObj(clusterImage::broker)
    .filter(Objects::nonNull)
    .map(BrokerRegistration::rack)
    .filter(Optional::isPresent)
    .map(Optional::get)
    .sorted()
    .collect(Collectors.joining(";"));
topicHasher.putString(racks, StandardCharsets.UTF_8); // sorted racks with separator ";"
});
Stream Processing Inefficiency
Nested stream processing creates multiple intermediate collections for each partition. The inner stream operations execute on every partition iteration, causing O(n*m) complexity where n = partitions and m = replicas. Performance degrades with partition count and replication factor, affecting hash computation scalability.
Standards
- ISO-IEC-25010-Performance-Efficiency-Time-Behavior
- Optimization-Pattern-Stream-Efficiency
- Algorithmic-Complexity-Nested-Operations
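One possible shape for such an optimization, sketched as a drop-in fragment under the assumption that ClusterImage#brokers() exposes the broker-id-to-registration map (as in Kafka's metadata image API): resolve each broker's rack once, then do O(1) lookups per replica.

// Resolve racks once per cluster image instead of once per replica occurrence.
Map<Integer, String> rackByBrokerId = new HashMap<>();
clusterImage.brokers().forEach((brokerId, registration) ->
    registration.rack().ifPresent(rack -> rackByBrokerId.put(brokerId, rack)));

topicImage.partitions().entrySet().stream()
    .sorted(Map.Entry.comparingByKey())
    .forEach(entry -> {
        topicHasher.putInt(entry.getKey()); // partition id
        String racks = Arrays.stream(entry.getValue().replicas)
            .mapToObj(rackByBrokerId::get) // O(1) map lookup per replica
            .filter(Objects::nonNull)
            .sorted()
            .collect(Collectors.joining(";"));
        topicHasher.putString(racks, StandardCharsets.UTF_8);
    });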
HashFunction hf = Hashing.murmur3_128();
Hasher topicHasher = hf.newHasher()
    .putByte((byte) 0) // magic byte
    .putLong(topicImage.id().hashCode()) // topic Id
Magic Byte Documentation
The magic byte value lacks documentation explaining its purpose and versioning strategy. Future hash-algorithm changes may require different magic bytes for backward compatibility. Adding a constant with a descriptive name improves maintainability.
Standards
- Clean-Code-Comments
- Maintainability-Quality-Documentation
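A sketch of the suggested constant (the name is illustrative, not from the PR):

/** Version prefix for the topic hash input layout; bump when the format changes. */
private static final byte TOPIC_HASH_MAGIC_BYTE = (byte) 0;

The call site would then read topicHasher.putByte(TOPIC_HASH_MAGIC_BYTE), making the versioning intent explicit.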
Pull Request Feedback 🔍
Actionable comments posted: 4
🧹 Nitpick comments (1)
group-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupTest.java (1)
119-160: Optional: add an invariance test for replica order. Consider a test that swaps replica order for a partition but preserves racks; the hash should stay the same since racks are sorted.
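A sketch of that test, assuming a hypothetical fixture builder imageWithReplicas(int[]) that produces a MetadataImage with the given replica assignment but identical racks (the real test would reuse whatever builds FOO_METADATA_IMAGE):

@Test
void testComputeTopicHashIgnoresReplicaOrder() {
    // Hypothetical helper: same brokers and racks, replicas listed in different orders.
    MetadataImage image = imageWithReplicas(new int[]{0, 1});
    MetadataImage reordered = imageWithReplicas(new int[]{1, 0});

    long hash = Group.computeTopicHash(
        image.topics().getTopic(FOO_TOPIC_ID), image.cluster());
    long reorderedHash = Group.computeTopicHash(
        reordered.topics().getTopic(FOO_TOPIC_ID), reordered.cluster());

    // Racks are sorted before hashing, so replica order must not affect the hash.
    assertEquals(hash, reorderedHash);
}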
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- build.gradle (1 hunk)
- checkstyle/import-control-group-coordinator.xml (1 hunk)
- gradle/dependencies.gradle (2 hunks)
- group-coordinator/src/main/java/org/apache/kafka/coordinator/group/Group.java (2 hunks)
- group-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupTest.java (1 hunk)
🧰 Additional context used
🧬 Code graph analysis (2)
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/Group.java (3)
- metadata/src/main/java/org/apache/kafka/image/ClusterImage.java (1): ClusterImage (36-108)
- metadata/src/main/java/org/apache/kafka/image/TopicImage.java (1): TopicImage (37-92)
- metadata/src/main/java/org/apache/kafka/metadata/BrokerRegistration.java (1): BrokerRegistration (44-398)
group-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupTest.java (2)
- clients/src/main/java/org/apache/kafka/common/Uuid.java (1): Uuid (32-196)
- metadata/src/main/java/org/apache/kafka/image/MetadataImage.java (1): MetadataImage (33-200)
🔇 Additional comments (7)
build.gradle (1)
1423-1424: Guava dep addition looks correct; ensure LICENSE update. Dependency wiring is fine for group-coordinator. Please update LICENSE-binary to account for Guava and any transitive artifacts.
gradle/dependencies.gradle (2)
64-65: Version pin for Guava is fine; please confirm license artifacts. Adding versions.guava is consistent. Ensure binary LICENSE files are updated accordingly.
151-152: libs mapping OK. The libs.guava definition is correct and matches the version entry.
checkstyle/import-control-group-coordinator.xml (1)
80-81: Import allowance scoped appropriately. Allowing only com.google.common.hash is precise and aligned with usage.
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/Group.java (1)
229-237: Deterministic group hash over sorted topics: LGTM. Sorting by topic name and combining ordered 64-bit hashes is clear and stable.
group-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupTest.java (2)
162-188: Group hash tests: LGTM. They capture ordering sensitivity as intended.
50-65: Ignore suggested test update: Group.computeTopicHash still uses topicImage.id().hashCode(), so the expected hash in the test should remain based on .hashCode(). Likely an incorrect or invalid review comment.
static long computeTopicHash(TopicImage topicImage, ClusterImage clusterImage) {
    HashFunction hf = Hashing.murmur3_128();
    Hasher topicHasher = hf.newHasher()
        .putByte((byte) 0) // magic byte
        .putLong(topicImage.id().hashCode()) // topic Id
        .putString(topicImage.name(), StandardCharsets.UTF_8) // topic name
        .putInt(topicImage.partitions().size()); // number of partitions

    topicImage.partitions().entrySet().stream().sorted(Map.Entry.comparingByKey()).forEach(entry -> {
        topicHasher.putInt(entry.getKey()); // partition id
        String racks = Arrays.stream(entry.getValue().replicas)
            .mapToObj(clusterImage::broker)
            .filter(Objects::nonNull)
            .map(BrokerRegistration::rack)
            .filter(Optional::isPresent)
            .map(Optional::get)
            .sorted()
            .collect(Collectors.joining(";"));
        topicHasher.putString(racks, StandardCharsets.UTF_8); // sorted racks with separator ";"
    });
    return topicHasher.hash().asLong();
}
Use full 128-bit UUID; avoid ambiguous rack concatenation.
- Bug: putLong(topicImage.id().hashCode()) collapses 128-bit UUID to 32-bit int, increasing collision risk. Use most/least bits.
- Improvement: Joining racks with ";" can collide if rack strings contain ";" (e.g., "a;" + "b" vs "a" + ";b"). Encode lengths instead.
Apply this essential fix for UUID handling:

- Hasher topicHasher = hf.newHasher()
-     .putByte((byte) 0) // magic byte
-     .putLong(topicImage.id().hashCode()) // topic Id
-     .putString(topicImage.name(), StandardCharsets.UTF_8) // topic name
-     .putInt(topicImage.partitions().size()); // number of partitions
+ Hasher topicHasher = hf.newHasher()
+     .putByte((byte) 0) // magic byte
+     // topic id: use full 128 bits to avoid collisions
+     .putLong(topicImage.id().getMostSignificantBits())
+     .putLong(topicImage.id().getLeastSignificantBits())
+     .putString(topicImage.name(), StandardCharsets.UTF_8) // topic name
+     .putInt(topicImage.partitions().size()); // number of partitions

Optionally harden rack encoding to avoid separator ambiguity:
- String racks = Arrays.stream(entry.getValue().replicas)
-     .mapToObj(clusterImage::broker)
-     .filter(Objects::nonNull)
-     .map(BrokerRegistration::rack)
-     .filter(Optional::isPresent)
-     .map(Optional::get)
-     .sorted()
-     .collect(Collectors.joining(";"));
- topicHasher.putString(racks, StandardCharsets.UTF_8); // sorted racks with separator ";"
+ List<String> racks = Arrays.stream(entry.getValue().replicas)
+     .mapToObj(clusterImage::broker)
+     .filter(Objects::nonNull)
+     .map(BrokerRegistration::rack)
+     .flatMap(Optional::stream)
+     .sorted()
+     .toList();
+ topicHasher.putInt(racks.size());
+ for (String rack : racks) {
+     topicHasher.putInt(rack.length());
+     topicHasher.putString(rack, StandardCharsets.UTF_8);
+ }
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
static long computeTopicHash(TopicImage topicImage, ClusterImage clusterImage) {
    HashFunction hf = Hashing.murmur3_128();
    Hasher topicHasher = hf.newHasher()
        .putByte((byte) 0) // magic byte
        // topic id: use full 128 bits to avoid collisions
        .putLong(topicImage.id().getMostSignificantBits())
        .putLong(topicImage.id().getLeastSignificantBits())
        .putString(topicImage.name(), StandardCharsets.UTF_8) // topic name
        .putInt(topicImage.partitions().size()); // number of partitions
    topicImage.partitions().entrySet().stream()
        .sorted(Map.Entry.comparingByKey())
        .forEach(entry -> {
            topicHasher.putInt(entry.getKey()); // partition id
            // hardened rack encoding to avoid separator ambiguity
            List<String> racks = Arrays.stream(entry.getValue().replicas)
                .mapToObj(clusterImage::broker)
                .filter(Objects::nonNull)
                .map(BrokerRegistration::rack)
                .flatMap(Optional::stream)
                .sorted()
                .toList();
            topicHasher.putInt(racks.size());
            for (String rack : racks) {
                topicHasher.putInt(rack.length());
                topicHasher.putString(rack, StandardCharsets.UTF_8);
            }
        });
    return topicHasher.hash().asLong();
}
@Test
void testComputeTopicHashWithDifferentMagicByte() {
    long result = Group.computeTopicHash(FOO_METADATA_IMAGE.topics().getTopic(FOO_TOPIC_ID), FOO_METADATA_IMAGE.cluster());

    HashFunction hf = Hashing.murmur3_128();
    Hasher topicHasher = hf.newHasher()
        .putByte((byte) 1) // different magic byte
        .putLong(FOO_TOPIC_ID.hashCode()) // topic Id
        .putString(FOO_TOPIC_NAME, StandardCharsets.UTF_8) // topic name
        .putInt(FOO_NUM_PARTITIONS) // number of partitions
        .putInt(0) // partition 0
        .putString("rack0;rack1", StandardCharsets.UTF_8) // rack of partition 0
        .putInt(1) // partition 1
        .putString("rack1;rack2", StandardCharsets.UTF_8); // rack of partition 1
    assertNotEquals(topicHasher.hash().asLong(), result);
}
🛠️ Refactor suggestion | 🟠 Major
Apply same UUID change here.

-     .putByte((byte) 1) // different magic byte
-     .putLong(FOO_TOPIC_ID.hashCode()) // topic Id
+     .putByte((byte) 1) // different magic byte
+     .putLong(FOO_TOPIC_ID.getMostSignificantBits()) // topic Id (MSB)
+     .putLong(FOO_TOPIC_ID.getLeastSignificantBits()) // topic Id (LSB)

📝 Committable suggestion
@Test
void testComputeTopicHashWithDifferentMagicByte() {
    long result = Group.computeTopicHash(FOO_METADATA_IMAGE.topics().getTopic(FOO_TOPIC_ID),
        FOO_METADATA_IMAGE.cluster());
    HashFunction hf = Hashing.murmur3_128();
    Hasher topicHasher = hf.newHasher()
        .putByte((byte) 1) // different magic byte
        .putLong(FOO_TOPIC_ID.getMostSignificantBits()) // topic Id (MSB)
        .putLong(FOO_TOPIC_ID.getLeastSignificantBits()) // topic Id (LSB)
        .putString(FOO_TOPIC_NAME, StandardCharsets.UTF_8) // topic name
        .putInt(FOO_NUM_PARTITIONS) // number of partitions
        .putInt(0) // partition 0
        .putString("rack0;rack1", StandardCharsets.UTF_8) // rack of partition 0
        .putInt(1) // partition 1
        .putString("rack1;rack2", StandardCharsets.UTF_8); // rack of partition 1
    assertNotEquals(topicHasher.hash().asLong(), result);
}
🤖 Prompt for AI Agents
In group-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupTest.java around lines 67 to 82, the test builds a manual topic hasher using FOO_TOPIC_ID.hashCode(); update it to use the topic UUID components like the production code: replace the single putLong(FOO_TOPIC_ID.hashCode()) call with two calls, putLong(FOO_TOPIC_ID.getMostSignificantBits()) and putLong(FOO_TOPIC_ID.getLeastSignificantBits()), so the test uses the UUID's actual bits when computing the expected hash.
HashFunction hf = Hashing.murmur3_128();
Hasher topicHasher = hf.newHasher()
    .putByte((byte) 0) // magic byte
    .putLong(FOO_TOPIC_ID.hashCode()) // topic Id
    .putString(FOO_TOPIC_NAME, StandardCharsets.UTF_8) // topic name
    .putInt(FOO_NUM_PARTITIONS) // number of partitions
    // different partition order
    .putInt(1) // partition 1
    .putString("rack1;rack2", StandardCharsets.UTF_8) // rack of partition 1
    .putInt(0) // partition 0
    .putString("rack0;rack1", StandardCharsets.UTF_8); // rack of partition 0
assertNotEquals(topicHasher.hash().asLong(), result);
}
🛠️ Refactor suggestion | 🟠 Major
Apply same UUID change here.

-     .putByte((byte) 0) // magic byte
-     .putLong(FOO_TOPIC_ID.hashCode()) // topic Id
+     .putByte((byte) 0) // magic byte
+     .putLong(FOO_TOPIC_ID.getMostSignificantBits()) // topic Id (MSB)
+     .putLong(FOO_TOPIC_ID.getLeastSignificantBits()) // topic Id (LSB)

📝 Committable suggestion
HashFunction hf = Hashing.murmur3_128();
Hasher topicHasher = hf.newHasher()
    .putByte((byte) 0) // magic byte
    .putLong(FOO_TOPIC_ID.getMostSignificantBits()) // topic Id (MSB)
    .putLong(FOO_TOPIC_ID.getLeastSignificantBits()) // topic Id (LSB)
    .putString(FOO_TOPIC_NAME, StandardCharsets.UTF_8) // topic name
    .putInt(FOO_NUM_PARTITIONS) // number of partitions
    // different partition order
    .putInt(1) // partition 1
    .putString("rack1;rack2", StandardCharsets.UTF_8) // rack of partition 1
    .putInt(0) // partition 0
    .putString("rack0;rack1", StandardCharsets.UTF_8); // rack of partition 0
assertNotEquals(topicHasher.hash().asLong(), result);
🤖 Prompt for AI Agents
In group-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupTest.java around lines 88 to 100, the test still uses FOO_TOPIC_ID.hashCode() when serializing the topic UUID; replace the single putLong(FOO_TOPIC_ID.hashCode()) with two explicit long writes for the UUID parts, putLong(FOO_TOPIC_ID.getMostSignificantBits()) followed by putLong(FOO_TOPIC_ID.getLeastSignificantBits()), so the UUID is serialized consistently with the other changes.
HashFunction hf = Hashing.murmur3_128();
Hasher topicHasher = hf.newHasher()
    .putByte((byte) 0) // magic byte
    .putLong(FOO_TOPIC_ID.hashCode()) // topic Id
    .putString(FOO_TOPIC_NAME, StandardCharsets.UTF_8) // topic name
    .putInt(FOO_NUM_PARTITIONS) // number of partitions
    .putInt(0) // partition 0
    .putString("rack1;rack0", StandardCharsets.UTF_8) // different rack order of partition 0
    .putInt(1) // partition 1
    .putString("rack1;rack2", StandardCharsets.UTF_8); // rack of partition 1
assertNotEquals(topicHasher.hash().asLong(), result);
}
🛠️ Refactor suggestion | 🟠 Major
Apply same UUID change here.

-     .putByte((byte) 0) // magic byte
-     .putLong(FOO_TOPIC_ID.hashCode()) // topic Id
+     .putByte((byte) 0) // magic byte
+     .putLong(FOO_TOPIC_ID.getMostSignificantBits()) // topic Id (MSB)
+     .putLong(FOO_TOPIC_ID.getLeastSignificantBits()) // topic Id (LSB)

📝 Committable suggestion
HashFunction hf = Hashing.murmur3_128();
Hasher topicHasher = hf.newHasher()
    .putByte((byte) 0) // magic byte
    .putLong(FOO_TOPIC_ID.getMostSignificantBits()) // topic Id (MSB)
    .putLong(FOO_TOPIC_ID.getLeastSignificantBits()) // topic Id (LSB)
    .putString(FOO_TOPIC_NAME, StandardCharsets.UTF_8) // topic name
    .putInt(FOO_NUM_PARTITIONS) // number of partitions
    .putInt(0) // partition 0
    .putString("rack1;rack0", StandardCharsets.UTF_8) // different rack order of partition 0
    .putInt(1) // partition 1
    .putString("rack1;rack2", StandardCharsets.UTF_8); // rack of partition 1
assertNotEquals(topicHasher.hash().asLong(), result);
}
🤖 Prompt for AI Agents
In group-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupTest.java around lines 106 to 117, the test currently uses FOO_TOPIC_ID.hashCode() when feeding the UUID into the hasher; change this to encode the full UUID like elsewhere, replacing the single putLong(FOO_TOPIC_ID.hashCode()) with two putLong calls that write FOO_TOPIC_ID.getMostSignificantBits() and FOO_TOPIC_ID.getLeastSignificantBits() so the complete UUID is used in the hashing.
CodeAnt-AI Description
Add deterministic topic and group hashing to group coordinator
What Changed
- Added static computeTopicHash and computeGroupHash utilities to Group, hashing topic id, name, partition count, and sorted replica racks with Murmur3.
- Added Guava as a group-coordinator dependency and allowed com.google.common.hash in the checkstyle import controls.
- Added unit tests covering hash stability and sensitivity.
Impact
✅ Detect topic metadata changes
✅ Deterministic group hashes for metadata comparison
✅ Verified sensitivity to id/name/partition/rack and ordering changes
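For context, a minimal sketch of how a coordinator might consume these hashes for change detection (hypothetical wiring; assumes computeGroupHash takes a map of topic name to topic hash):

// Recompute the group hash from current metadata and compare with the stored value.
static boolean groupMetadataChanged(long storedGroupHash, Map<String, Long> currentTopicHashes) {
    return Group.computeGroupHash(currentTopicHashes) != storedGroupHash;
}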
Summary by CodeRabbit
New Features
- Deterministic Murmur3-based hash computation for topics and consumer groups in the group coordinator.
Tests
- Unit tests validating hash stability and sensitivity to topic id, name, partition, and rack changes.
Chores
- Added the Guava dependency, updated Gradle dependency catalogs, and extended checkstyle import controls.